Graph-GAP Reproducibility Dataset (v1)

This package contains all *derived* data used by the paper’s trial-run scoring.
To avoid copyright issues, it DOES NOT include full verbatim text from the UNICEF PDF or external PDFs.
Instead, each unit is referenced by:
- page anchor
- unit_rank (extraction order)
- text_sha256 (hash of the exact extracted sentence)
- text_len_chars (length in characters)

Files
1) units_coder_level.csv
   Per-unit scores from 3 parallel coders (A/B/C): E/M/G/K/Readiness + GapScore.

2) units_aggregated.csv
   Aggregated per-unit score + disagreement flag.

3) requirement_summary.csv
   Requirement-level aggregation, with a 95% bootstrap CI for the mean GapScore and a CI for Readiness P80.

4) reliability_metrics.csv / weighted_kappa_pairs.csv
   Inter-rater reliability metrics.

5) external_proxy_signals.csv / external_proxy_summary.csv
   Proxy criterion signals from external governance/enforcement corpora (no source text included).

6) audit_log.csv / run_config.json
   Fingerprints, parameters, and extraction mode.

7) sample_unit_excerpts_20.csv
   OPTIONAL debugging sample only (20 units, each excerpt at most 12 words).
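
For readers unfamiliar with the bootstrap CI reported in requirement_summary.csv, the general idea can be sketched as follows. This is a minimal illustration only: the percentile method, the resample count, the seed, and the function name bootstrap_mean_ci are assumptions made here, not the settings used for the paper. Consult run_config.json and reproduce.py for the actual procedure.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of `values`.

    ASSUMPTION: percentile method with a fixed seed; the dataset's own
    resampling scheme and parameters may differ.
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, record the mean of each resample, then sort.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Read the CI bounds off the empirical distribution of resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = max(lo_idx, int((1 - alpha / 2) * n_resamples) - 1)
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    # Hypothetical per-unit GapScore values for one requirement.
    scores = [2.0, 2.5, 3.0, 3.5, 3.5, 4.0, 4.25]
    print(bootstrap_mean_ci(scores))
```

With alpha=0.05 this yields a 95% interval; a CI for a percentile such as P80 would follow the same pattern with the percentile substituted for the mean.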

How to re-derive the exact text units locally
- Use local PDF copies whose SHA256 matches the fingerprints recorded in audit_log.csv.
- Re-run the extraction + hashing procedure (see run_config.json) and match on text_sha256.
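
The two steps above can be sketched in Python as follows. This is a hedged illustration, not the packaged procedure: the helper names are hypothetical, and the text hash assumes UTF-8 encoding with no whitespace or Unicode normalization. The authoritative extraction and hashing settings are those in run_config.json.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """SHA256 of a file, read in chunks so large PDFs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def text_fingerprint(sentence):
    """Return (text_sha256, text_len_chars) for one extracted sentence.

    ASSUMPTION: the sentence is hashed as UTF-8 bytes without normalization.
    """
    return hashlib.sha256(sentence.encode("utf-8")).hexdigest(), len(sentence)


def match_units(sentences, known_hashes):
    """Map each dataset hash to the locally extracted sentence that produces it."""
    matched = {}
    for sentence in sentences:
        sha, _ = text_fingerprint(sentence)
        if sha in known_hashes:
            matched[sha] = sentence
    return matched


if __name__ == "__main__":
    # Toy example: "abc" stands in for an extracted sentence.
    known = {text_fingerprint("abc")[0]}
    print(match_units(["abc", "not in dataset"], known))
```

If few or no hashes match, the likely causes are a different PDF revision (check the file hash first) or a mismatch in extraction mode or text normalization.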

Scales
- E/M/G/K: 1..5 (higher = bigger gap)
- Readiness: 0..5 (higher = more implementable)
- GapScore = 0.25*(E+M+G+K)
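
As a worked example of the formula above: GapScore is the unweighted mean of the four dimension scores, so it also lies in 1..5. The function name gap_score and the range check are illustrative assumptions, not part of the packaged scripts.

```python
def gap_score(e, m, g, k):
    """GapScore = 0.25 * (E + M + G + K), with each input on the 1..5 scale."""
    for name, value in (("E", e), ("M", m), ("G", g), ("K", k)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be in 1..5, got {value}")
    return 0.25 * (e + m + g + k)


if __name__ == "__main__":
    # E=4, M=3, G=5, K=2 gives 0.25 * 14
    print(gap_score(4, 3, 5, 2))  # 3.5
```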


[v2 update]
- Added reproduce.py + requirements.txt + QUICKSTART_CN.txt + SCHEMA.txt
